AITopics | validity measure

Collaborating Authors

validity measure

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A new validity measure for fuzzy c-means clustering

Kim, Dae-Won, Lee, Kwang H.

arXiv.org Artificial IntelligenceJul-9-2024

ABSTRACT: A new cluster validity index is proposed for fuzzy clusters obtained from fuzzy c-means algorithm. The proposed validity index exploits inter-cluster proximity between fuzzy clusters. Inter-cluster proximity is used to measure the degree of overlap between clusters. A low proximity value refers to well-partitioned clusters. The best fuzzy c-partition is obtained by minimizing inter-cluster proximity with respect to c. Well-known data sets are tested to show the effectiveness and reliability of the proposed index.

fuzzy cluster, partition, validity index, (15 more...)

arXiv.org Artificial Intelligence

2407.06774

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.31)

Add feedback

Normalised clustering accuracy: An asymmetric external cluster validity measure

Gagolewski, Marek

arXiv.org Machine LearningSep-30-2023

There is no, nor will there ever be, single best clustering algorithm, but we would still like to be able to distinguish between methods which work well on certain task types and those that systematically underperform. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. Yet, their validity is questionable, because the clusterings they promote can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to the reference, ground truth groupings that are provided by experts. In this paper, we argue that the commonly-used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, miss some desirable properties, e.g., they do not identify worst-case scenarios correctly or are not easily interpretable. This makes comparing clustering algorithms across many benchmark datasets difficult. To remedy these issues, we propose and analyse a new measure: a version of the optimal set-matching accuracy, which is normalised, monotonic, scale invariant, and corrected for the imbalancedness of cluster sizes (but neither symmetric nor adjusted for chance).

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2209.02935

Country:

Europe > Poland > Masovia Province > Warsaw (0.04)
Oceania > Australia (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Clustering performance analysis using new correlation based cluster validity indices

Wiroonsri, Nathakhun

arXiv.org Machine LearningSep-23-2021

There are various cluster validity measures used for evaluating clustering results. One of the main objective of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet, one of the weakness that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown and there might be more than one potential sub-optimal options that a user may wish to choose based on different applications. We develop two new cluster validity indices based on a correlation between an actual distance between a pair of data points and a centroid distance of clusters that the two points locate in. Our proposed indices constantly yield several peaks at different numbers of clusters which overcome the weakness previously stated. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios including the well-known iris data set and a real-world marketing application have been conducted in order to compare the proposed validity indices with several well-known ones.

correlation, optimal number, validity measure, (12 more...)

arXiv.org Machine Learning

2109.11172

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

An Information-Theoretic External Cluster-Validity Measure

Dom, Byron E

arXiv.org Machine LearningDec-12-2012

In this paper we propose a measure of clustering quality or accuracy that is appropriate in situations where it is desirable to evaluate a clustering algorithm by somehow comparing the clusters it produces with ``ground truth' consisting of classes assigned to the patterns by manual means or some other means in whose veracity there is confidence. Such measures are refered to as ``external'. Our measure also has the characteristic of allowing clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. In cases where all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. In cases where the numbers of clusters are different, however, it computes the reduction in the number of bits that would be required to encode (compress) the class labels if both the encoder and decoder have free acccess to the cluster labels. To achieve this encoding the estimated conditional probabilities of the class labels given the cluster labels must also be encoded. These estimated probabilities can be seen as a model for the class labels and their associated code length as a model cost.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1301.0565

Country:

North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > New York (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback